
    GPU accelerated maximum cardinality matching algorithms for bipartite graphs

    We design, implement, and evaluate GPU-based algorithms for the maximum cardinality matching problem in bipartite graphs. Such algorithms have a variety of applications in computer science, scientific computing, bioinformatics, and other areas. To the best of our knowledge, ours is the first study that focuses on GPU implementation of maximum cardinality matching algorithms. We compare the proposed algorithms with serial and multicore implementations from the literature on a large set of real-life problems; in the majority of cases, one of our GPU-accelerated algorithms is demonstrated to be faster than both the sequential and multicore implementations.
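    The abstract does not reproduce the paper's GPU kernels, but the underlying problem is classic. Below is a minimal serial sketch of maximum cardinality matching via augmenting paths (Kuhn's algorithm); the graph encoding and names are illustrative assumptions, not the authors' implementation. GPU variants typically extract parallelism from the search phases of such algorithms.

    ```python
    # Minimal serial sketch of maximum cardinality matching in a bipartite
    # graph via augmenting paths (Kuhn's algorithm). Illustrates only the
    # problem being accelerated, not the paper's GPU algorithms.

    def max_cardinality_matching(adj, n_left, n_right):
        """adj[u] lists right-side vertices adjacent to left vertex u."""
        match_right = [-1] * n_right  # match_right[v] = left partner of v

        def try_augment(u, visited):
            # Search for an augmenting path starting at left vertex u.
            for v in adj[u]:
                if v in visited:
                    continue
                visited.add(v)
                # v is free, or its current partner can be re-matched.
                if match_right[v] == -1 or try_augment(match_right[v], visited):
                    match_right[v] = u
                    return True
            return False

        size = 0
        for u in range(n_left):
            if try_augment(u, set()):
                size += 1
        return size, match_right

    # Example: 3 left vertices, 3 right vertices (illustrative data).
    adj = {0: [0, 1], 1: [0], 2: [1, 2]}
    print(max_cardinality_matching(adj, 3, 3))  # (3, [1, 0, 2])
    ```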

    Combinatorial Problems in High-Performance Computing: Partitioning

    This extended abstract presents a survey of combinatorial problems encountered in scientific computations on today's high-performance architectures, with sophisticated memory hierarchies, multiple levels of cache, and multiple processors both on and off chip. For parallelism, the most important problem is to partition sparse matrices, graphs, or hypergraphs into nearly equal-sized parts while trying to reduce inter-processor communication. Common approaches to such problems involve multilevel methods based on coarsening and uncoarsening (hyper)graphs, matching of similar vertices, searching for good separator sets and good splittings, dynamic adjustment of load imbalance, and two-dimensional matrix splitting methods.
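    To make the partitioning objective concrete, here is a minimal sketch that bisects a graph into two nearly equal halves and counts the resulting edge cut. It illustrates only the balance-versus-cut trade-off; real multilevel partitioners coarsen, partition, and refine, which this toy BFS growth does not attempt. All names and data are illustrative assumptions.

    ```python
    # Toy bisection: grow one part by BFS to half the vertices, then count
    # the edge cut. Stands in for the objective, not for multilevel methods.
    from collections import deque

    def bfs_bisection(adj):
        """adj: dict vertex -> list of neighbors. Returns (part, cut_size)."""
        n = len(adj)
        part_a, queue = set(), deque([next(iter(adj))])
        while queue and len(part_a) < n // 2:
            v = queue.popleft()
            if v in part_a:
                continue
            part_a.add(v)
            queue.extend(w for w in adj[v] if w not in part_a)
        # Edge cut: edges with endpoints in different parts (counted once).
        cut = sum(1 for v in adj for w in adj[v]
                  if v < w and (v in part_a) != (w in part_a))
        return part_a, cut

    # A 4-cycle: the best bisection cuts exactly 2 edges.
    adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(bfs_bisection(adj))  # ({0, 1}, 2)
    ```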

    Parallelization of Mapping Algorithms for Next Generation Sequencing Applications

    With the advent of next-generation high-throughput sequencing instruments, large volumes of short sequence data are generated at an unprecedented rate. Processing and analyzing these massive data requires overcoming several challenges. A particular challenge addressed in this abstract is the mapping of short sequences (reads) to a reference genome while allowing mismatches. This is a significantly time-consuming combinatorial problem in many applications, including whole-genome resequencing, targeted sequencing, transcriptome/small RNA sequencing, DNA methylation analysis, and ChIP sequencing, and takes time on the order of days using existing sequential techniques on large-scale datasets. In this work, we introduce six parallelization methods, each having different scalability characteristics, to speed up short sequence mapping. We also address an associated load balancing problem that involves grouping nodes of a tree from different levels. This problem arises due to a trade-off between computational cost and granularity while partitioning the workload. We comparatively present the proposed parallelization methods and give theoretical cost models for each of them. Experimental results on real datasets demonstrate the effectiveness of the methods and indicate that they successfully reduce the execution time from the order of days to just a few hours for large datasets. To the best of our knowledge, this is the first study on the parallelization of the short sequence mapping problem.
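    The six methods themselves are not described in the abstract; as a baseline illustration of distributing reads over workers, the sketch below maps reads to a reference with a mismatch budget using a process pool. The brute-force Hamming scan, constants, and names are illustrative assumptions, not the paper's techniques.

    ```python
    # Parallelize read mapping across reads with a process pool. A naive
    # Hamming-distance scan stands in for a real mapper.
    from multiprocessing import Pool

    REFERENCE = "ACGTACGTGGCATACGT"   # illustrative reference sequence
    MAX_MISMATCHES = 1                # illustrative mismatch budget

    def map_read(read):
        """Return reference positions where read aligns within the budget."""
        hits = []
        for i in range(len(REFERENCE) - len(read) + 1):
            mismatches = sum(a != b
                             for a, b in zip(read, REFERENCE[i:i + len(read)]))
            if mismatches <= MAX_MISMATCHES:
                hits.append(i)
        return read, hits

    if __name__ == "__main__":
        reads = ["ACGT", "GGCA", "TTTT", "CATA"]
        with Pool() as pool:          # one worker per core by default
            for read, hits in pool.map(map_read, reads):
                print(read, hits)
    ```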

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors, and optimization goals. This paper surveys the field by summing up and structuring known results and approaches.
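    One widely studied special case in this literature is mapping a linear chain of stages onto p processors as contiguous intervals so as to minimize the period, i.e., the maximum per-processor load (whose reciprocal bounds the throughput). Below is a minimal dynamic-programming sketch of that interval-mapping problem; the costs and parameters are illustrative assumptions.

    ```python
    # Chain-to-processors interval mapping: partition stage costs into p
    # contiguous intervals, minimizing the maximum interval load (the
    # pipeline's period). Simple O(n^2 * p) dynamic program.

    def min_period(costs, p):
        """costs[i] = work of stage i; p = number of processors."""
        n = len(costs)
        prefix = [0] * (n + 1)
        for i, c in enumerate(costs):
            prefix[i + 1] = prefix[i] + c
        INF = float("inf")
        # dp[k][i] = best period for the first i stages on k processors.
        dp = [[INF] * (n + 1) for _ in range(p + 1)]
        dp[0][0] = 0
        for k in range(1, p + 1):
            for i in range(1, n + 1):
                for j in range(i):  # last processor takes stages j..i-1
                    load = prefix[i] - prefix[j]
                    dp[k][i] = min(dp[k][i], max(dp[k - 1][j], load))
        return dp[p][n]

    print(min_period([4, 2, 6, 3, 5], 3))  # 8, e.g. [4,2] [6] [3,5]
    ```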

    Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions

    Finding dense substructures in a graph is a fundamental graph mining operation, with applications in bioinformatics, social networks, and visualization, to name a few. Yet most standard formulations of this problem (such as clique, quasi-clique, and k-densest subgraph) are NP-hard. Furthermore, the goal is rarely to find the "true optimum", but to identify many (if not all) dense substructures, understand their distribution in the graph, and ideally determine relationships among them. Current dense subgraph finding algorithms usually optimize some objective and only find a few such subgraphs without providing any structural relations. We define the nucleus decomposition of a graph, which represents the graph as a forest of nuclei. Each nucleus is a subgraph where smaller cliques are present in many larger cliques. The forest of nuclei is a hierarchy by containment, where the edge density increases as we proceed towards leaf nuclei. Sibling nuclei can have limited intersections, which enables discovering overlapping dense subgraphs. With the right parameters, the nucleus decomposition generalizes the classic notions of k-core and k-truss decompositions. We give provably efficient algorithms for nucleus decompositions and empirically evaluate their behavior on a variety of real graphs. The tree of nuclei consistently gives a global, hierarchical snapshot of dense substructures and outputs dense subgraphs of higher quality than other state-of-the-art solutions. Our algorithm can process graphs with tens of millions of edges in less than an hour.
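    Since nucleus decomposition generalizes k-cores, the classic k-core peeling algorithm is a useful reference point. The sketch below computes core numbers by repeatedly removing minimum-degree vertices; it covers only this simplest special case, not the paper's clique-based nuclei or forest construction, and the example graph is illustrative.

    ```python
    # Classic k-core decomposition by iterative peeling: a vertex's core
    # number is the largest k such that it survives in the k-core.

    def core_numbers(adj):
        """adj: dict vertex -> set of neighbors. Returns vertex -> core."""
        degree = {v: len(adj[v]) for v in adj}
        core, remaining, k = {}, set(adj), 0
        while remaining:
            # Peel every vertex whose remaining degree is at most k.
            peel = [v for v in remaining if degree[v] <= k]
            if not peel:
                k += 1
                continue
            for v in peel:
                core[v] = k
                remaining.discard(v)
                for w in adj[v]:
                    if w in remaining:
                        degree[w] -= 1
        return core

    # A triangle plus a pendant vertex: the triangle is the 2-core.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(core_numbers(adj))  # e.g. {3: 1, 0: 2, 1: 2, 2: 2}
    ```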

    MICA: microRNA integration for active module discovery

    A successful approach to disease-specific module discovery is the integration of gene expression data with the protein-protein interaction (PPI) network. Although many algorithms have been developed for this purpose, they focus only on the network genes (mostly on the well-connected ones), entirely neglecting the genes whose interactions are partially or totally unknown. In addition, they make use only of gene expression data, which does not give the complete picture of actual protein expression levels. The cell uses different mechanisms, such as microRNAs, to post-transcriptionally regulate proteins without affecting the corresponding genes' expression. Due to this complexity, relying on a single data type is unlikely to recover the correct module(s). Today, the unprecedented amount of publicly available disease-related heterogeneous data encourages the development of new methodologies to better understand complex diseases. In this work, we propose a novel workflow, Mica, which, to the best of our knowledge, is the first study integrating miRNA, mRNA, and PPI information to identify disease-specific gene modules. The novelty of Mica lies in several directions, such as the early modification of mRNA expression with microRNA data to better highlight the indirect dependencies between genes. We applied Mica to microRNA-Seq and mRNA-Seq data sets of 699 invasive ductal carcinoma samples and 150 invasive lobular carcinoma samples from The Cancer Genome Atlas (TCGA). The Mica modules are shown to unravel new and interesting dependencies between genes. Additionally, the modules accurately differentiate between case and control samples while being highly enriched with disease-specific pathways and genes.
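    The abstract does not give Mica's integration formula, so the sketch below only gestures at the kind of early-integration step it describes: down-weighting each gene's mRNA expression by the expression of the miRNAs that target it, then scoring a candidate module. The weighting scheme, target map, and data are all illustrative assumptions, not Mica's actual formulation.

    ```python
    # Illustrative early integration: repress each gene's mRNA signal by
    # the summed expression of its targeting miRNAs, then score a module.

    mrna = {"GENE_A": 8.0, "GENE_B": 5.0, "GENE_C": 6.0}            # mRNA-Seq
    mirna = {"miR-1": 2.0, "miR-2": 1.0}                            # miRNA-Seq
    targets = {"miR-1": ["GENE_A"], "miR-2": ["GENE_A", "GENE_C"]}  # miRNA -> genes

    def adjusted_expression(mrna, mirna, targets, alpha=0.5):
        """Down-weight each gene by the expression of its targeting miRNAs."""
        repression = {g: 0.0 for g in mrna}
        for m, genes in targets.items():
            for g in genes:
                repression[g] += mirna[m]
        return {g: x - alpha * repression[g] for g, x in mrna.items()}

    def module_score(module, expr):
        """Score a candidate module by its mean adjusted expression."""
        return sum(expr[g] for g in module) / len(module)

    expr = adjusted_expression(mrna, mirna, targets)
    print(expr)                                   # GENE_A repressed most
    print(module_score(["GENE_A", "GENE_C"], expr))  # 6.0
    ```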